Word matching using single closed contours for indexing handwritten historical documents
Internal identifier: 000F18 (Main/Exploration)
Authors: Tomasz Adamek [Ireland]; Noel E. O'Connor [Ireland]; Alan F. Smeaton [Ireland]
Source:
- International Journal on Document Analysis and Recognition (Print) [ISSN 1433-2833]; 2007.
French descriptors
- Pascal (Inist)
English descriptors
- KwdEn
Abstract
Effective indexing is crucial for providing convenient access to scanned versions of large collections of historically valuable handwritten manuscripts. Traditional handwriting recognizers based on optical character recognition (OCR) do not perform well on historical documents. As a result, a holistic word recognition approach has recently gained popularity as an attractive and more straightforward solution (Lavrenko et al., in Proc. Document Image Analysis for Libraries (DIAL'04), pp. 278-287, 2004). Such techniques attempt to recognize words based on scalar and profile-based features extracted from whole word images. In this paper, we propose a new approach to holistic word recognition for historical handwritten manuscripts. It is based on matching word contours instead of whole images or word profiles. The new method consists of robust extraction of closed word contours, followed by an elastic contour matching technique originally proposed for general shapes (Adamek and O'Connor, IEEE Trans. Circuits Syst. Video Technol. 5:2004). We demonstrate that multiscale contour-based descriptors can effectively capture intrinsic word features without any segmentation of words into smaller subunits. Our experiments show a recognition accuracy of 83%, which considerably exceeds the performance of other systems reported in the literature.
Affiliations:
Links to previous steps (curation, corpus, ...)
- to stream PascalFrancis, to step Corpus: 000322
- to stream PascalFrancis, to step Curation: 000464
- to stream PascalFrancis, to step Checkpoint: 000260
- to stream Main, to step Merge: 000F31
- to stream Main, to step Curation: 000F18
The document in XML format
<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en" level="a">Word matching using single closed contours for indexing handwritten historical documents</title>
<author><name sortKey="Adamek, Tomasz" sort="Adamek, Tomasz" uniqKey="Adamek T" first="Tomasz" last="Adamek">Tomasz Adamek</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s1>Centre for Digital Video Processing, Dublin City University</s1>
<s2>Dublin</s2>
<s3>IRL</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Irlande (pays)</country>
<wicri:noRegion>Dublin</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="O Connor, Noel E" sort="O Connor, Noel E" uniqKey="O Connor N" first="Noel E." last="O'Connor">Noel E. O'Connor</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s1>Centre for Digital Video Processing, Dublin City University</s1>
<s2>Dublin</s2>
<s3>IRL</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Irlande (pays)</country>
<wicri:noRegion>Dublin</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Smeaton, Alan F" sort="Smeaton, Alan F" uniqKey="Smeaton A" first="Alan F." last="Smeaton">Alan F. Smeaton</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s1>Centre for Digital Video Processing, Dublin City University</s1>
<s2>Dublin</s2>
<s3>IRL</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Irlande (pays)</country>
<wicri:noRegion>Dublin</wicri:noRegion>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">INIST</idno>
<idno type="inist">07-0469287</idno>
<date when="2007">2007</date>
<idno type="stanalyst">PASCAL 07-0469287 INIST</idno>
<idno type="RBID">Pascal:07-0469287</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000322</idno>
<idno type="wicri:Area/PascalFrancis/Curation">000464</idno>
<idno type="wicri:Area/PascalFrancis/Checkpoint">000260</idno>
<idno type="wicri:doubleKey">1433-2833:2007:Adamek T:word:matching:using</idno>
<idno type="wicri:Area/Main/Merge">000F31</idno>
<idno type="wicri:Area/Main/Curation">000F18</idno>
<idno type="wicri:Area/Main/Exploration">000F18</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en" level="a">Word matching using single closed contours for indexing handwritten historical documents</title>
<author><name sortKey="Adamek, Tomasz" sort="Adamek, Tomasz" uniqKey="Adamek T" first="Tomasz" last="Adamek">Tomasz Adamek</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s1>Centre for Digital Video Processing, Dublin City University</s1>
<s2>Dublin</s2>
<s3>IRL</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Irlande (pays)</country>
<wicri:noRegion>Dublin</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="O Connor, Noel E" sort="O Connor, Noel E" uniqKey="O Connor N" first="Noel E." last="O'Connor">Noel E. O'Connor</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s1>Centre for Digital Video Processing, Dublin City University</s1>
<s2>Dublin</s2>
<s3>IRL</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Irlande (pays)</country>
<wicri:noRegion>Dublin</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Smeaton, Alan F" sort="Smeaton, Alan F" uniqKey="Smeaton A" first="Alan F." last="Smeaton">Alan F. Smeaton</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s1>Centre for Digital Video Processing, Dublin City University</s1>
<s2>Dublin</s2>
<s3>IRL</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Irlande (pays)</country>
<wicri:noRegion>Dublin</wicri:noRegion>
</affiliation>
</author>
</analytic>
<series><title level="j" type="main">International journal on document analysis and recognition : (Print)</title>
<title level="j" type="abbreviated">Int. j. doc. anal. recognit. : (Print)</title>
<idno type="ISSN">1433-2833</idno>
<imprint><date when="2007">2007</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt><title level="j" type="main">International journal on document analysis and recognition : (Print)</title>
<title level="j" type="abbreviated">Int. j. doc. anal. recognit. : (Print)</title>
<idno type="ISSN">1433-2833</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en"><term>Annotation</term>
<term>Character recognition</term>
<term>Document analysis</term>
<term>Image analysis</term>
<term>Image processing</term>
<term>Indexing</term>
<term>Manuscript character</term>
<term>Multiscale method</term>
<term>Natural language</term>
<term>Optical character recognition</term>
<term>Pattern extraction</term>
<term>Performance evaluation</term>
<term>Segmentation</term>
<term>Video signal</term>
<term>Word</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr"><term>Indexation</term>
<term>Caractère manuscrit</term>
<term>Reconnaissance optique caractère</term>
<term>Reconnaissance caractère</term>
<term>Mot</term>
<term>Langage naturel</term>
<term>Analyse documentaire</term>
<term>Analyse image</term>
<term>Traitement image</term>
<term>Signal vidéo</term>
<term>Evaluation performance</term>
<term>Annotation</term>
<term>Extraction forme</term>
<term>Méthode échelle multiple</term>
<term>Segmentation</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">Effective indexing is crucial for providing convenient access to scanned versions of large collections of historically valuable handwritten manuscripts. Traditional handwriting recognizers based on optical character recognition (OCR) do not perform well on historical documents. As a result, a holistic word recognition approach has recently gained popularity as an attractive and more straightforward solution (Lavrenko et al., in Proc. Document Image Analysis for Libraries (DIAL'04), pp. 278-287, 2004). Such techniques attempt to recognize words based on scalar and profile-based features extracted from whole word images. In this paper, we propose a new approach to holistic word recognition for historical handwritten manuscripts. It is based on matching word contours instead of whole images or word profiles. The new method consists of robust extraction of closed word contours, followed by an elastic contour matching technique originally proposed for general shapes (Adamek and O'Connor, IEEE Trans. Circuits Syst. Video Technol. 5:2004). We demonstrate that multiscale contour-based descriptors can effectively capture intrinsic word features without any segmentation of words into smaller subunits. Our experiments show a recognition accuracy of 83%, which considerably exceeds the performance of other systems reported in the literature.</div>
</front>
</TEI>
<affiliations><list><country><li>Irlande (pays)</li>
</country>
</list>
<tree><country name="Irlande (pays)"><noRegion><name sortKey="Adamek, Tomasz" sort="Adamek, Tomasz" uniqKey="Adamek T" first="Tomasz" last="Adamek">Tomasz Adamek</name>
</noRegion>
<name sortKey="O Connor, Noel E" sort="O Connor, Noel E" uniqKey="O Connor N" first="Noel E." last="O'Connor">Noel E. O'Connor</name>
<name sortKey="Smeaton, Alan F" sort="Smeaton, Alan F" uniqKey="Smeaton A" first="Alan F." last="Smeaton">Alan F. Smeaton</name>
</country>
</tree>
</affiliations>
</record>
To work with this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000F18 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000F18 | SxmlIndent | more
To link to this page within the Wicri network
{{Explor lien |wiki= Ticri/CIDE |area= OcrV1 |flux= Main |étape= Exploration |type= RBID |clé= Pascal:07-0469287 |texte= Word matching using single closed contours for indexing handwritten historical documents }}
This area was generated with Dilib version V0.6.32.